Extraction, Enhancement and OCR

Authors

  • Huiping Li
  • David Doermann
  • Omid Kia
Abstract

In this paper we address the problem of text extraction, enhancement and recognition in digital video. Compared with optical character recognition (OCR) from document images, text extraction and recognition in digital video presents several new challenges. First, the text in video is often embedded in complex backgrounds, making text extraction and separation difficult. Second, image data contained in video frames is often digitized and/or subsampled at a much lower resolution than is typical for document images. As a result, most commercial OCR software cannot recognize text extracted from video. We have implemented a hybrid wavelet/neural network segmenter to extract text regions, and we use a two-stage enhancement scheme prior to recognition. First, we use Shannon interpolation to raise the image resolution; second, we postprocess the block with normal/inverse text classification and adaptive thresholding. Experimental results show that our text extraction scheme can extract both scene text and graphical text robustly, and that reasonable OCR results are achieved after enhancement.
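The two-stage enhancement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`sinc_upsample`, `local_mean`, `binarize_text_block`), the mean-versus-median polarity test, and the local-mean threshold with its `radius` and `offset` parameters are all assumptions chosen for illustration. The interpolation uses a truncated sinc kernel over the finite block, which only approximates ideal Shannon reconstruction.

```python
import numpy as np

def sinc_upsample(img, factor):
    """Separable Shannon (sinc) interpolation of a 2-D grayscale block."""
    def weights(n):
        # Row i holds sinc(t_i - j) for output position t_i = i / factor.
        t = np.arange(n * factor) / factor
        return np.sinc(t[:, None] - np.arange(n)[None, :])
    rows, cols = img.shape
    return weights(rows) @ img.astype(float) @ weights(cols).T

def local_mean(img, radius):
    """Mean over a (2*radius+1)^2 window; edges are padded by replication."""
    size = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    total = (integral[size:, size:] - integral[:-size, size:]
             - integral[size:, :-size] + integral[:-size, :-size])
    return total / size**2

def binarize_text_block(block, radius=7, offset=10.0):
    """Return a boolean mask where True marks presumed text pixels."""
    block = block.astype(float)
    # Assumed polarity heuristic: the background is the majority of pixels,
    # so the median tracks it. A mean above the median suggests light text
    # on a dark background (inverse text), and the block is flipped so that
    # text is always dark before thresholding.
    if block.mean() > np.median(block):
        block = block.max() - block
    # Assumed adaptive threshold: a pixel is text if it is darker than its
    # neighbourhood mean by more than `offset`.
    return block < local_mean(block, radius) - offset
```

Under these assumptions, a low-resolution text block would be enhanced with something like `mask = binarize_text_block(sinc_upsample(block, 4))` before being passed to an OCR engine. The window radius and offset would need tuning for real video frames.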

Similar resources

Text Extraction and Character Recognition from Image using Mathematical Morphology and OCR Technique

Images contain various types of useful information that should be extracted whenever required and this information may be in the form of text present in image. Extraction of this information involves detection, localization, extraction, enhancement and recognition of the text from the given image. Mathematical morphology is the foundation of morphological image processing, which consists of a s...

A Literature Survey on Digital Image Processing Techniques in Character Recognition of Indian Languages

Handwritten character recognition has long been a frontier area of research in the field of pattern recognition. There is a large demand for OCR on handwritten documents in image processing. Although sufficient studies have been performed on foreign scripts like Arabic, Chinese and Japanese, only very few works can be traced for handwritten character recognition, mainly for the south Indian scripts...

Optical Character Recognition Systems

Optical character recognition (OCR) is the process of classifying optical patterns contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. This chapter presents the basic ideas of OCR needed for a better understanding of the book. The chapter starts with a brief background and history of OCR systems. Then the di...

Optical Character Recognition - IMPACT Best Practice Guide

Contents: Background and developments to date; How OCR works; Best Practice in the Use of OCR; Avoiding problems i...

Shirorekha Chopping Integrated Tesseract OCR Engine for Enhanced Hindi Language Recognition

Tesseract OCR Engine is one of the most efficient open source OCR engines currently available. Recently, Tesseract OCR 3.01 is capable of recognizing Hindi language but still it needs some enhancement to improve the performance. The Hindi language recognition accuracy is quite low even for the printed text, as the conjunct character combinations of Hindi Language are not easily separable due to...

Publication date: 2007